Abstract: Sina Weibo, which was launched in 2009, is the most
popular Chinese micro-blogging service. It has been reported that
Sina Weibo had more than 400 million registered users by the
end of the third quarter of 2012. Sina Weibo and Twitter have
a lot in common. However, in terms of following preference,
Sina Weibo users, most of whom are Chinese, behave differently
from Twitter users.

This work is based on a data set of Sina Weibo, which
contains 80.8 million users' profiles and 7.2 billion relations, and
a large data set of Twitter. First, some basic features of Sina
Weibo and Twitter are analyzed, such as the degree and activeness
distributions, the correlation between degree and activeness, and
the degree of separation. Then the following preference is
investigated by studying the assortative mixing, friend similarities,
following distribution, edge balance ratio, and ranking correlation,
where the edge balance ratio is newly proposed to measure the balance
property of graphs. It is found that Sina Weibo has a lower
reciprocity rate, more positively balanced relations, and is more
disassortative. Coinciding with Asian traditional culture, the
following preference of Sina Weibo users is more concentrated and
hierarchical: they are more likely to follow people at higher or the
same social levels and less likely to follow people at lower levels
than themselves. In contrast, the same kind of following preference
is weaker in Twitter. Twitter users are more open, as they follow
people from all levels, which accords with its global characteristic
and the prevalence of western civilization. The message forwarding
behavior is studied by displaying the propagation levels, delays, and
critical users. The following preference derives not only from usage
habits but also from underlying factors such as personalities and
social moralities, which are worthy of future research. To the best
of our knowledge, this is the first comparative work focusing on
following behavior using large-scale data sets of both a global and
a Chinese local online social network.

Index Terms: Online social network, Twitter, Sina Weibo, Following
preference, PageRank, Assortative, User Behaviors, Edge Balance
Ratio



I. Introduction 

Twitter, a popular worldwide online microblogging service, has
achieved commercial success and attracted the attention of many
researchers. For various reasons, Twitter is difficult to access
in mainland China. Many local Chinese micro-blogging services
sprang up around 2009, including Sina Weibo, Tencent Weibo,
and Sohu Weibo. Sina Weibo is the most popular one, with more
than 400 million users by the end of the third quarter of 2012.
"Weibo" is the Chinese word for "microblog". Sina Weibo and
Twitter share some basic features. A user can "follow" any
other user and become his or her follower without any verification
or approval. Users can post short messages (called tweets on
Twitter and weibos on Sina Weibo) of limited length on their
main pages, and all their followers then receive the messages.
Tweets contain only text, but weibos allow pictures and videos
to be attached. Messages can be forwarded, or "retweeted". This
mechanism enhances the power of message propagation, so that one
message can reach a very large audience in a very short time.
Although Sina Weibo has opened up its overseas markets
to users from other countries, almost all Sina Weibo users
are currently Chinese, while Twitter users are distributed globally,
except in the countries that block it. This difference in constituency
results in macro-level differences between Sina Weibo and Twitter.

Research has shown that, unlike other online social networks
such as Facebook and MySpace, Twitter's unidirectional "follow"
mechanism gives it an additional role as a social medium: people
use Twitter not only to follow their friends but also to follow
online celebrities and organizations in order to get news and
gossip. The network structures of Sina Weibo and Twitter are
very complex. However, the macro-level features of the network
structure are directly caused by every user's following preference
at the micro level. It is therefore essential to determine whether
all users have similar following preferences and how users'
attributes, such as nationality and cultural background, influence
whom they choose to follow. This is the main purpose of this paper.

In this work, large-scale data sets are used to comparatively
analyze Sina Weibo and Twitter. The data set of Sina Weibo
contains 80.8 million users' profiles and 7.2 billion relations and
covers about 20% of all users. The data set of Twitter is
from [13] and contains 41 million users and 1.5 billion relations.
First, some basic features of Sina Weibo and Twitter are
analyzed, such as the degree and activeness distributions, the
correlation between degree and activeness, and the degree of
separation. Then the following preference is investigated by
studying the assortative mixing, friends' similarity, following
distribution, and edge balance ratio, where the edge balance ratio
is newly proposed to measure a graph's balance property [24].
Sina Weibo and Twitter users are ranked by the number of followers
and by PageRank. The ranking correlation reflects the diversity of
following preference. Based on the following preference and the
social media role of micro-blogging services, the message forwarding
behavior is studied by displaying the propagation levels, delays,
and critical users. The contribution of our work is to reveal the
difference in following preference between Sina Weibo and Twitter.
To the best of our knowledge, this is the first comparative work
focusing on following behavior using large-scale data sets of both
a global and a Chinese local online social network.

The rest of this paper is organized as follows. Related
work is briefly reviewed in Section II. In Section III, some
basic analysis is performed on Sina Weibo and Twitter. The main
contribution of this work is contained in Section IV, where the
following preference is compared between Twitter users and Sina
Weibo users. In Section V, users are ranked by the number of
followers and PageRank. In Section VI, two real examples are
displayed, and the propagation level, delay, coverage, pattern, and
the important users in the process are studied. Section VII
concludes the paper.

II. Related works

Twitter has been studied widely. Java et al. studied the
topological and geographical properties of Twitter [10]. Kwak et
al. presented a quantitative study of the entire Twittersphere and
information diffusion [13]. They used basic statistical methods to
analyze topological features and found a non-power-law distribution,
a short effective diameter, and low reciprocity for Twitter,
which indicate differences from human social networks.
They concluded that Twitter is a news medium more than
a social network. Krishnamurthy et al. presented a detailed
characterization of Twitter and studied user behaviors including
following, geographic growth patterns, and other aspects [12].
By comparing three different measures of influence (in-degree,
retweets, and mentions), Cha et al. found that the number of
followers is not related to the number of retweets and mentions.
Yang et al. studied the hashtags used in tweet propagation and
reported that hashtags in Twitter play a dual role: a bookmark
of content and a symbol of community membership [27].
Kitsak et al. found, using k-shell decomposition analysis, that the
most efficient spreaders are those located within the core of the
network [11]. Java et al. compared some network properties of Twitter
using users' profiles from different continents [10]. They found
that users in Europe and Asia tend to have higher reciprocity and
clustering coefficient values in their corresponding subgraphs.

Among other online social networks, Flickr, LiveJournal, Orkut,
and YouTube are studied in [19]. Ahn et al. investigated the
topological characteristics of Cyworld, MySpace, and Orkut [1].
They examined the average degree, average clustering coefficient,
assortativity, degree of separation, and other properties of these
online social network services.

Micro-blogging services in China have experienced rapid growth.
It is believed that Sina Weibo, the most popular one in China,
may exceed Twitter in the number of users due to the huge
base of Chinese netizens. However, there are few quantitative works
on Sina Weibo or on the differences between these two micro-blogging
giants. Gao et al. studied users' basic behaviors,
such as access methods, writing style, topics, and interest change.
They analyzed more than 40 million micro-blogging activities
but did not consider following relations [7]. Yin et al. studied
the patterns of advertisement propagation in Sina Weibo [29].
They extracted propagation features such as volume, topology,
and time, and then used the K-means clustering algorithm to group
the messages.

Our work on following preference is also related to link
analysis. Chen et al. studied friend recommendations designed to
help users find known, off-line contacts and discover new friends
on social networking sites [4]. Hopcroft et al. proposed a machine
learning model to study two-way relationship prediction
in social networks [9]. Yang et al. analyzed the structure of the
spammers' networks that they identified on Twitter and found a
following preference inside the spammers' networks [26]. They
found that criminal accounts tend to form a small-world network
and that criminal hubs prefer to follow criminal accounts. Ghosh
et al. found that the link farming strategy used by spammers begins
with following social capitalists, who are popular and tend to
follow back anyone who connects to them [8].

III. BASIC ANALYSIS 

In this section, some basic analysis is performed on Sina Weibo
to study its network topology and other features before delving
into the following preference analysis, since Sina Weibo has not
been studied as widely as Twitter. The results are compared with
those of Twitter.

A. Degree Distribution 

The network structure of Sina Weibo and Twitter can be
modeled as directed graphs. Each node has nodes linking to it and
nodes it links to, corresponding to the followers and followings.
Figure 1 displays the distributions of in-degree (followers) and
out-degree (followings) of Sina Weibo and Twitter, respectively.
The x-axis represents the number of followers or followings
and the y-axis represents the complementary cumulative distribution
function (CCDF).

Fig. 1. Distributions of in/out-degree (followers/followings) of Sina Weibo
and Twitter. (a) Distributions of followers; x-axis: in-degree (number of
followers). (b) Distributions of followings; x-axis: out-degree (number of
followings).

There are some irregularities in Figure 1. First, in Figure 1(b)
there is a rapid decrease of the solid line and a slight but
noticeable decrease of the dashed line where the number of
followings is around 2,000. Sina Weibo limits the maximum
number of followings, and only a few VIP members can exceed
the upper bound. Twitter also imposed such a limit but
removed it after 2009.

Second, the tails of the dashed lines represent online celebrities
such as actors/actresses, TV show hosts, musicians or singers,
and news media. In Twitter, their proportion is higher than a
power-law distribution predicts, but this characteristic
is not found in Sina Weibo. It reflects the global coverage
of Twitter, where world celebrities gather, while Sina Weibo is
used locally.

Third, there is a gap between the solid line and the dashed line
in Figure 1(a). The solid line is above the dashed one where the
number of followers is larger than 10. This indicates that Twitter
has more users with fewer than 10 followers than Sina Weibo.

In Figure 1(a), both the solid and dashed lines approximately
fit a power-law distribution. The power-law coefficient of the
solid line is 2.3336 and that of the dashed line is 2.1363. Many
previous studies have reported that most social networks have a
power-law degree distribution. The in-degree power-law coefficient of
Twitter is reported as 2.276 in [13] and 2.4 in [10]. Mislove et
al. reported the results for other social networks in [19]. The
in-degree power-law coefficient of Flickr is 1.78, that of LiveJournal
is 1.65, and that of YouTube is 1.99.
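
The paper does not state how its power-law coefficients were fitted. As an illustration only, the following minimal sketch computes an empirical CCDF like the one plotted in Figure 1 and estimates a power-law exponent with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the choice of `x_min`, and the synthetic sample are assumptions made for this example, not part of the original analysis.

```python
import math
import random


def ccdf(values):
    """Return the sorted distinct values x and P(X >= x) for each of them."""
    n = len(values)
    ordered = sorted(values)
    xs, ps = [], []
    for idx, x in enumerate(ordered):
        # The first occurrence of x has exactly n - idx values >= x.
        if idx == 0 or x != ordered[idx - 1]:
            xs.append(x)
            ps.append((n - idx) / n)
    return xs, ps


def powerlaw_exponent_mle(values, x_min):
    """Continuous power-law MLE for the exponent, using values >= x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, seed=0):
    """Draw n synthetic samples with density ~ x^(-alpha) (inverse transform)."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    degrees = sample_powerlaw(alpha=2.3, x_min=1.0, n=20000, seed=1)
    print("estimated exponent:", powerlaw_exponent_mle(degrees, x_min=1.0))
```

On real follower counts the result depends strongly on the chosen `x_min`, so the estimate should be read together with the CCDF plot rather than on its own.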

B. Activeness 

The number of micro-blogs, which are called "tweets" on
Twitter and "weibos" on Sina Weibo, is a measure of activeness.
The number of followers a user has cannot measure
activeness: a pop star who updates his Sina Weibo occasionally is
obviously less active than a talkative ordinary person who posts tens
of weibos every day, yet the pop star has many more followers.
Figure 2 displays the distribution of weibos, which is a two-stage
power-law distribution with a heavy tail. The curve fits a power-law
distribution with an exponent of 1.1897 where the number of weibos
is fewer than 1,000, and a power-law distribution with
an exponent of 2.7101 where the number of weibos is in the range
of 1,000 to 10,000. The heavy tail represents a very small number
of users who have posted tens of thousands of weibos.

As the number of weibos is a measure of a user's activeness,
the two-stage power-law distribution in Figure 2 shows that
activeness is distributed differently from followers and followings.
Recall that the power-law coefficient in Figure 1(a) is 2.3336.
In the first stage of the distribution, the power-law coefficient
of weibos is smaller than that of followers and followings, which
indicates that activeness is easier to accumulate at a low level. In
the second stage, the coefficient becomes larger, which indicates
that it becomes hard to maintain activeness at a high level.

To gauge the statistical correlation between them, the number
of weibos (y-axis) is plotted against the number of followers
(x-axis) in Figure 3(a). Figure 3(a) shows a positive correlation,
but the "+" marks disperse as the number of followers increases.
In Figure 3(a), the dashed line has an inflection at point A.
Before that point, the number of weibos that a user posts is around
7 times the number of followers he has. The correlation becomes
weaker beyond point A. However, the mean of the "+" marks in log
scale still grows slowly.

In addition, the correlation between users' weibos and followings
is plotted in Figure 3(b). The irregularity around point B
is caused by the recommendation system. As soon as a new user
registers, Sina Weibo recommends a set of users for him to
follow, which results in many users casually acquiring around 50
followings initially. The cut-off around point C is due to the
upper bound on followings, which is also observed in Figure 1.
Compared with Figure 3(a), the correlation between activeness and
the number of followings shows similar features. It is reported
that the correlation between the number of followers/followings
and the number of tweets also shows a positive trend in Twitter.

C. Degree of Separation 

The concept of degree of separation comes from Stanley
Milgram's famous "six degrees of separation" experiment [18],
which concludes that any two people in the world can be connected
by no more than six people on average. Since then, this
experiment has been tested on various networks. Watts and Strogatz
proposed the "small-world" model in [25] to model networks
with a small degree of separation.

Two users are called friends when they follow each other on
Twitter or Sina Weibo. In fact, only 36.2% of relations on Twitter
and 20.3% of relations on Sina Weibo are reciprocal. Sina Weibo
has an even lower reciprocity rate, which emphasizes its social
media role. The friend preference will be studied
in the next section. The degree of separation is studied on the
friends' network.

Fig. 3. The number of weibos and that of followers/followings of Sina Weibo

The general method for finding the shortest path length over a graph
is the Dijkstra algorithm. Since the time complexity of the
standard Dijkstra algorithm is $O(N^2)$, where $N$ is the number
of nodes, it is unacceptable for Sina Weibo and Twitter with
millions of users. Snowball sampling [14] is used to reduce the
time complexity and to obtain an approximate result. Snowball
sampling randomly picks seed nodes and performs a breadth-first
search from each, yielding a list of nodes marked with their distance
from the seed. Counting the nodes at each distance gives a histogram
of the path length. The distributions of the shortest distance from
the seeds are plotted in Figure 4 with 6,000, 7,000, and 8,000 seeds
for both Sina Weibo and Twitter. The distributions almost overlap
completely as the number of seeds increases, which means that these
seeds are enough to estimate the average distance.
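
The sampling procedure described above can be sketched as follows. This is a minimal illustration, assuming the friends' network is given as an in-memory adjacency dictionary `adj` (a hypothetical representation; the paper does not describe its data structures). It runs a breadth-first search from randomly chosen seeds, accumulates a histogram of distances, and derives the average distance and a percentile-based effective diameter. The percentile here is returned as an integer distance, whereas fractional values such as those reported in the paper would require interpolation.

```python
import random
from collections import Counter, deque


def bfs_distances(adj, seed):
    """Breadth-first search: shortest hop count from seed to each reachable node."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_path_length_histogram(adj, num_seeds, rng_seed=0):
    """Histogram {distance: count} accumulated over randomly picked seeds."""
    rng = random.Random(rng_seed)
    nodes = sorted(adj)
    seeds = rng.sample(nodes, min(num_seeds, len(nodes)))
    hist = Counter()
    for s in seeds:
        for d in bfs_distances(adj, s).values():
            if d > 0:  # skip the seed itself
                hist[d] += 1
    return hist


def average_distance(hist):
    total = sum(hist.values())
    return sum(d * c for d, c in hist.items()) / total


def effective_diameter(hist, q=0.9):
    """Smallest distance d such that at least a fraction q of pairs are within d."""
    total = sum(hist.values())
    running = 0
    for d in sorted(hist):
        running += hist[d]
        if running >= q * total:
            return d


if __name__ == "__main__":
    # Toy friends' network: a path 0 - 1 - 2 - 3 (mutual follows only).
    friends = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    h = sampled_path_length_histogram(friends, num_seeds=4)
    print(dict(h), average_distance(h), effective_diameter(h))
```

Comparing the histograms obtained with increasing `num_seeds` mirrors the convergence check described in the text.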

The average distance between two arbitrary users is 4.86 for
Twitter and 4.63 for Sina Weibo. The effective diameter [15] of a
graph is defined as the 90th percentile distance, and it is 5.89
for Twitter and 5.06 for Sina Weibo. In comparison with previous
studies, Kwak et al. reported an average distance of 4.12 for
their data set of Twitter [13]. Other online social networks have
also been analyzed: the average distance is 5.67 for Flickr, 4.25 for
Orkut, and 5.10 for YouTube [2], [20]. The average distances of
Sina Weibo and Twitter are quite short for their size. Even
though friend relations are a minor part of the total relations, the
short average distance reflects that the entire network is tightly
connected. With a smaller degree of separation and effective
diameter, it is suggested that the network structure of Sina Weibo
is even tighter and more complex.

Fig. 5. Number of average followers and registered time of Sina Weibo.



IV. FOLLOWING PREFERENCE 

In this section, the following preference of Sina Weibo and
Twitter users is analyzed. The assortative mixing, friends'
similarity, following distribution, and edge balance ratio are studied
to reveal the macro-level differences in structure and the underlying
reasons for users' behaviors.

A. Attractive Features for Chinese Users 

The following preference of followers determines that users with
certain features will attract more followers than others. Analyzing
the attractive features helps to deduce the following preference.
In our data set, each Sina Weibo user has 89.7 followers on
average. Male users make up 52.46% of the total. Each male
user has 87.6 followers on average, while the number for female
users is 92.0. Sina Weibo introduced a certification system to
reduce fake information. Users apply to the system to become
verified users, or "v-users", which shows their real identity on
their homepages. Although verified users make up only 0.34% of all
users, each of them has 9,455.0 followers on average. In contrast,
unverified users have only 57.5 followers on average, which is much
fewer than the global average.

The registration time of a user also affects how many followers
he can attract. Figure 5 displays the average number of followers
against registration time. The x-axis represents the elapsed time
in months since Sina Weibo opened in August 2009. The left-side
y-axis represents the number of users registered each month. The
right-side y-axis represents the average number of followers of
the users who registered in the corresponding month. The number
of registered users in each month increases over time, and
earlier registered users have more followers. It should be noted
that when Sina Weibo opened, most of the first group of users were
celebrities, which causes the high starting point. Another reason
may be the recommendation mechanism mentioned in the
last section.

Fig. 6. Number of followers of user's friends and that of himself of Sina
Weibo and Twitter.



B. Friends' Similarity 

Sociological research [17] has reported homophily in social
networks. Homophily is the preference of people to associate with
others similar to themselves. In this subsection, the following
preference in the friends' network is analyzed. Two aspects are
considered for measuring users' similarity: geographic region
and fame.

Users in the same city or province might know each other
off-line. In our statistics, there are 621 million pairs of friends,
and 47.3% of them are in the same province. Twitter does not have a
standard format for geographic information, so it is hard to parse
users' self-written locations. In [13], time zone is used to
represent location, leading to the conclusion that Twitter users
with fewer than 2,000 friends are likely to be geographically close.

The probability of a popular pop star following an ordinary user
is much lower than that of following another popular star. The
number of followers is a measure of fame. Figure 6 plots the median
number of followers of a user's friends against that of the user
himself. Every "+" represents the median number of followers of
friends over all the users with the same number of followers. The
dashed line stands for the mean in log scale. In Figure 6, both
networks show a significant positive correlation between the number
of followers of a user's friends and that of the user himself when
the user has fewer than 1,000 followers. Although the median values
disperse when the number of followers becomes larger, the
correlation is still positive when considering the mean in log scale.

The correlation between the number of followers of friends
and that of a user himself is similar to degree correlation.
The difference is that degree correlation usually applies to
bidirectional graphs and compares the degree of a node with those
of its neighbor nodes. The degree correlation describes whether a
hub in the graph is more or less likely to connect to other hubs.
The former situation is known as a feature of human social networks
[22]. The preference of people to choose those similar to themselves
as their friends holds in Sina Weibo and Twitter in terms of
geographic region and fame.

C. Following Distribution 

The following preference of users with different numbers of
followers can be studied through the distribution of the following
relations, which is plotted in Figure 7. All users are divided
into seven groups based on their number of followers. The
boundaries of these groups are shown on the axes. Every circle
in Figure 7 represents the following relations from users in the
corresponding "Follower" group to users in the corresponding
"Followed" group. The area of the circle represents the number
of these following relations.
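
As a rough illustration of how such a group-to-group distribution could be tabulated, the sketch below bins users by follower count and counts the edges between each pair of bins. The edge list format `(follower, followed)` and the `boundaries` values are assumptions for this example. The actual seven group boundaries appear only on the axes of Figure 7 and are not reproduced here.

```python
from bisect import bisect_right
from collections import Counter


def following_distribution(edges, boundaries):
    """Count edges between follower-count groups.

    edges: iterable of (a, b) pairs meaning "a follows b".
    boundaries: ascending thresholds; group g holds users whose follower
                count is >= boundaries[g - 1] and < boundaries[g].
    Returns a Counter keyed by (follower_group, followed_group).
    """
    edges = list(edges)
    followers = Counter(b for _, b in edges)  # in-degree of every user

    def group(user):
        return bisect_right(boundaries, followers.get(user, 0))

    counts = Counter()
    for a, b in edges:
        counts[(group(a), group(b))] += 1
    return counts


if __name__ == "__main__":
    # Toy network: users 1, 2, 3 follow user 0, and user 0 follows user 1.
    toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
    # Hypothetical boundaries: group 0 = no followers, 1 = 1-2, 2 = 3 or more.
    table = following_distribution(toy_edges, boundaries=[1, 3])
    upward = sum(c for (src, dst), c in table.items() if dst >= src)
    print(dict(table), "edges to same or higher group:", upward)
```

Each entry of the resulting table corresponds to the area of one circle in a plot like Figure 7.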

It is concluded that both Sina Weibo and Twitter users prefer
to follow users who have a similar or larger number of followers,
because the circles above the diagonal are larger than those below
the diagonal. Moreover, this kind of following preference is more
significant for Sina Weibo users. However, Figure 7 fails to show
the following preference of celebrities because their number
is small. Other measurements are used in the next
subsections.



D. Assortative Mixing 

Assortative mixing [21], or assortativity, is a global measure
of the preference of nodes to connect to similar nodes. For
undirected networks, degree is always available as a node property
for calculating the assortative mixing by degree. For
directed networks, an approach based on a set of four
assortativity measures is introduced in [6]. Let
$\alpha, \beta \in \{in, out\}$ index the degree type, and let
$s_i^{\alpha}$ and $t_i^{\beta}$ denote the $\alpha$-degree of the
source node and the $\beta$-degree of the target node of edge $i$.
The assortativity is defined as

$$r(\alpha, \beta) = \frac{M^{-1} \sum_i \left[ (s_i^{\alpha} - \bar{s}^{\alpha}) (t_i^{\beta} - \bar{t}^{\beta}) \right]}{\sigma^{\alpha} \sigma^{\beta}} \qquad (1)$$

where $M$ is the number of edges, $\bar{s}^{\alpha}$ is the average in-
or out-degree of the source nodes, and
$\sigma^{\alpha} = \sqrt{M^{-1} \sum_i (s_i^{\alpha} - \bar{s}^{\alpha})^2}$;
$\bar{t}^{\beta}$ and $\sigma^{\beta}$ are similarly defined for the
target nodes. Nodes are more likely to link to similar nodes if $r$
is closer to 1 and less likely if $r$ is closer to $-1$. If $r$ is
close to 0, there is no significant correlation between the degrees
of source and target nodes.
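
Read as a Pearson correlation over edges, the definition above can be computed directly. The following is a minimal sketch under that reading, with an edge list of `(source, target)` pairs as an assumed input format and a hypothetical function name. It returns NaN when one of the standard deviations is zero, since the ratio is undefined in that case.

```python
import math
from collections import Counter


def directed_assortativity(edges, alpha, beta):
    """r(alpha, beta) for alpha, beta in {"in", "out"}.

    edges: list of (source, target) pairs, i.e. "source follows target".
    alpha selects the degree type of the source node of each edge,
    beta selects the degree type of the target node.
    """
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    deg = {"in": in_deg, "out": out_deg}

    s = [deg[alpha][u] for u, _ in edges]  # source degrees, one per edge
    t = [deg[beta][v] for _, v in edges]   # target degrees, one per edge
    m = len(edges)

    s_bar = sum(s) / m
    t_bar = sum(t) / m
    cov = sum((a - s_bar) * (b - t_bar) for a, b in zip(s, t)) / m
    sigma_s = math.sqrt(sum((a - s_bar) ** 2 for a in s) / m)
    sigma_t = math.sqrt(sum((b - t_bar) ** 2 for b in t) / m)
    if sigma_s == 0 or sigma_t == 0:
        return float("nan")  # undefined when all degrees are equal
    return cov / (sigma_s * sigma_t)


if __name__ == "__main__":
    # Toy network: users 1, 2, 3 follow user 0, and user 0 follows user 1.
    toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
    for a in ("in", "out"):
        for b in ("in", "out"):
            print(a, b, directed_assortativity(toy_edges, a, b))
```

Evaluating all four combinations of `alpha` and `beta` gives the assortativity profile of a network.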

The set of four $r(\alpha, \beta)$ values provides the profile of
directed network assortativity. The profiles of the Sina Weibo and
Twitter relation networks are plotted in Figure 8. It is well known
that social networks usually mix assortatively [21]. However, Twitter
shows a slightly disassortative property, as the four
$r(\alpha, \beta)$ are all negative and close to zero. In Sina
Weibo, $r(in, out)$ and $r(out, out)$ are positive and the rest are
negative. The disassortative property is not hard to explain, because
there exist many "unbalanced" relations linking small-degree users to
large-degree ones, which distinguishes the online network structure
from traditional social networks. The remarkable
difference between Sina Weibo and Twitter in Figure 8 is that
$r(out, in)$ of Sina Weibo is smaller and $r(out, out)$ is larger.
This indicates that users with few followings tend to follow users
with many followers and few followings. Therefore, ordinary users in
Sina Weibo have a stronger preference to follow people with a very
large number of followers.

Although assortative mixing provides some macro-level
information about the following preference, it is not detailed
enough, because only four scalar values are presented and
assortative mixing is a global measurement that effectively takes a
weighted average over all the edges. A recently proposed
measurement is used to reveal more
details in the next subsection.

E. Edge Balance Ratio 

The unidirectional relations in Sina Weibo and Twitter result
in the failure of homophily between two connected users, because
they can differ in many aspects such as geographic region,
occupation, influence, and fame. Edge balance ratio is a
measurement that describes this balance property of a directed
graph [24]. In directed graphs, an edge is not balanced if the nodes
at the two ends of the edge are not equivalent in some aspect.
The edge balance ratio, denoted as $R$, describes the balance level
of a directed edge from node $A$ to node $B$ and is defined as

$$R = \begin{cases} \dfrac{d_i(B)}{d_i(A)}, & d_i(A) \neq 0; \\ \infty, & d_i(A) = 0, \end{cases} \qquad (2)$$

where $d_i(B)$ and $d_i(A)$ are properties of nodes $B$ and $A$, such
as in-degree or PageRank. Since $R$ is a property of every edge, the
distribution of $R$ measures the balance property of a graph.
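
A minimal sketch of the definition above follows, using the number of followers (in-degree) as the default node property. The edge list format and the function name are assumptions for this example. Any other per-node score, such as PageRank, could be passed in through `prop`.

```python
import math
from collections import Counter


def edge_balance_ratios(edges, prop=None):
    """Edge balance ratio R for every directed edge (a, b), "a follows b".

    prop: optional dict mapping node -> property value d(node).
          Defaults to in-degree, i.e. the number of followers.
    R = d(b) / d(a), or infinity when d(a) is zero.
    """
    if prop is None:
        prop = Counter(b for _, b in edges)
    ratios = []
    for a, b in edges:
        d_a = prop.get(a, 0)
        d_b = prop.get(b, 0)
        ratios.append(math.inf if d_a == 0 else d_b / d_a)
    return ratios


if __name__ == "__main__":
    # Toy network: users 1, 2, 3 follow user 0, and user 0 follows user 1.
    toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
    print(edge_balance_ratios(toy_edges))
    # R > 1: following someone with more followers;
    # R < 1: following someone with fewer followers.
```

Because $R$ spans many orders of magnitude on real data, its distribution is usually inspected on a logarithmic axis.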

In Sina Weibo and Twitter, the number of followers and
PageRank are chosen as the node properties for calculating the edge
balance ratio. Figure 9 displays the distributions of the edge balance
ratio for Sina Weibo and Twitter. Every curve in Figure 9 reaches
its maximum value where R equals one, and its right side is
higher than its left side. The dashed lines representing Twitter
have two other local maximum values, where R is around 10^ 
and 10^^, while the solid lines representing Sina Weibo maintain 
monotonicity on both sides of the point where R equals one. The
positions of the solid and dashed lines indicate that Twitter has a
higher proportion of edges with small R than Sina Weibo. Besides,
Twitter has a lower proportion of edges whose R is larger than one
than Sina Weibo, except at the positions where the local maximum
values occur.

Fig. 9. Edge balance ratio of Sina Weibo and Twitter.

The edge balance ratio determines the type of relations in the
network.



(1) The relations with an edge balance ratio far larger than one
reflect users' hope to obtain news, gossip, or other types
of messages from celebrities with large influence and high
reputation. This is an important purpose for most users of
online micro-blogging services.

(2) The relations with an edge balance ratio close to one reflect
users' need to keep connections with friends, who usually
are at the same level. Homophily tells us that people tend to
associate with others similar to themselves.

(3) The relations with an edge balance ratio much less than one
contain rich hidden information and reveal the unique following
preference.

Figure 9 shows that Twitter has more edges of the third type.
This result indicates that the network structure of Twitter
might be less hierarchical than that of Sina Weibo, where highly
ranked users seldom follow ordinary users.

F. Summary 

The following preference is summarized as follows:

• If only friends' relations are considered, both Sina Weibo
and Twitter exhibit some level of homophily.

• Both Sina Weibo and Twitter users prefer to follow users
who have a similar or larger number of followers.

• Both Sina Weibo and Twitter have a disassortative property,
and Sina Weibo is more disassortative than Twitter.

• There are more edges of the third type in Twitter than in
Sina Weibo. The network structure of Twitter might be less
hierarchical than that of Sina Weibo.

It is suggested that these results come from the different
personalities and social moralities of Chinese Sina Weibo users
and Twitter users. Chinese users are more prudent when choosing
whom to follow, and they are more hierarchical, considering it
inappropriate to their status to follow people of a lower social
level. Twitter users, in contrast, are more open to following
people of different kinds and levels.



V. RANKING USERS

In this section, users are ranked by the number of followers
and by PageRank. Ranking gives a clear description of how
important a user is. The ranking correlation reflects the following
preference of top users.

A. Number of Followers vs. PageRank

Ranking users by the number of followers is a simple and
direct method, but it cannot reflect a user's influence. PageRank
is an algorithm proposed to rank web pages [23], and it
applies to all directed graphs. PageRank not only counts the
links to a web page but also evaluates the importance of the pages
that link to it. The basic idea of PageRank is that
influence propagates through the links and flows to the most
influential nodes.
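
To make the idea concrete, the sketch below runs a plain power iteration of PageRank on a follow graph, where an edge `(u, v)` means that `u` follows `v` and therefore passes influence to `v`. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed for this example. The paper does not state the parameters or implementation it used.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph.

    edges: list of (u, v) pairs meaning "u follows v".
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for u, v in edges:
        out_links[u].append(v)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy network: users 1, 2, 3 follow user 0, and user 0 follows user 1.
    toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
    for node, score in sorted(pagerank(toy_edges).items()):
        print(node, round(score, 4))
```

In the toy network, user 1 has a single follower but still receives a high score because that follower is the most followed user, which is the kind of effect that separates PageRank from a plain follower count.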

Figure 10 shows the top 20 lists ranked by the number of
followers and by PageRank for Sina Weibo and Twitter. The two lists
for each of Sina Weibo and Twitter are not exactly the same, but
they share many users, whose names are marked in gray. In the
lists ranked by the number of followers, we find that actors,
actresses, show hosts, and singers occupy most positions.
In the lists ranked by PageRank, some services, news media,
and politicians get higher ranks, for instance, the official service
and official iPhone client service for Sina Weibo, CNN breaking
news, and Barack Obama. Official services, news media, and the
president of the U.S. are more influential because people with many
followers also prefer to follow them, so the result
is reasonable.

B. Ranking Correlation 

Top 20 lists are not enough to measure the correlation between 
these two rankings. In this sub section, the ranking correlation 
is studied using generalized Kendall's tau |5|. If denoting two 
ranking top k list as ri and t2, i and j are elements in n or t2, 
the "optimistic approach" to Kendall's tau is deflned as 



$$K^{(0)}(\tau_1, \tau_2) = \sum_{i,j \in \tau_1 \cup \tau_2} K^{(0)}_{i,j}(\tau_1, \tau_2) \qquad (3)$$



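The value of $K^{(0)}_{i,j}(\tau_1, \tau_2)$ is determined by three cases.

(1) $i$ and $j$ are in both lists. If $i$ and $j$ are in the same order in both lists,
$K^{(0)}_{i,j}(\tau_1, \tau_2) = 0$; otherwise $K^{(0)}_{i,j}(\tau_1, \tau_2) = 1$.

(2) Both $i$ and $j$ appear in one list and only $i$ appears in the
other list. If $i$ is ranked higher than $j$, $K^{(0)}_{i,j}(\tau_1, \tau_2) = 0$;
otherwise $K^{(0)}_{i,j}(\tau_1, \tau_2) = 1$.

(3) $i$ and $j$ are in different lists, that is, $i$ appears only in one list
and $j$ only in the other. Then $K^{(0)}_{i,j}(\tau_1, \tau_2) = 1$.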

The normalized distance $K$ [16] is used to normalize the
ranking correlation. $K$ is defined as

$$K = 1 - \frac{K^{(0)}(\tau_1, \tau_2)}{k^2} \qquad (4)$$



where $k$ is the length of the lists. Figure 11 shows the correlation
between the ranking by number of followers and the ranking by
PageRank for Sina Weibo and Twitter. Both the solid line and the
dashed line are above 0.6. The dashed line, representing $K$
for Twitter, decreases as $k$ becomes large, while $K$ for Sina Weibo
stays around 0.8.
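
The three cases and the normalization can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for Figure 11. The function names are hypothetical, the normalization is assumed to be $K = 1 - K^{(0)}/k^2$, and pairs that appear together in one list while both are absent from the other are assumed to contribute 0, as the optimistic approach implies.

```python
from itertools import combinations


def kendall_optimistic(tau1, tau2):
    """Optimistic generalized Kendall distance K^(0) between two top-k lists.

    tau1, tau2: lists of distinct items, best-ranked item first.
    """
    pos1 = {item: r for r, item in enumerate(tau1)}
    pos2 = {item: r for r, item in enumerate(tau2)}
    distance = 0
    for i, j in combinations(set(tau1) | set(tau2), 2):
        in1 = (i in pos1, j in pos1)
        in2 = (i in pos2, j in pos2)
        if all(in1) and all(in2):
            # Case 1: both items in both lists; penalize opposite order.
            if (pos1[i] < pos1[j]) != (pos2[i] < pos2[j]):
                distance += 1
        elif all(in1) and any(in2):
            # Case 2: both in tau1, exactly one of them in tau2.
            present, absent = (i, j) if in2[0] else (j, i)
            if pos1[present] > pos1[absent]:
                distance += 1
        elif all(in2) and any(in1):
            # Case 2 with the roles of the two lists swapped.
            present, absent = (i, j) if in1[0] else (j, i)
            if pos2[present] > pos2[absent]:
                distance += 1
        elif any(in1) and any(in2):
            # Case 3: one item only in tau1, the other only in tau2.
            distance += 1
        # Otherwise both items are in one list and neither is in the
        # other; the optimistic approach (assumed here) adds nothing.
    return distance


def normalized_correlation(tau1, tau2):
    """K = 1 - K^(0) / k^2 for two lists of the same length k."""
    k = len(tau1)
    if k == 0 or len(tau2) != k:
        raise ValueError("both lists must have the same non-zero length k")
    return 1.0 - kendall_optimistic(tau1, tau2) / k ** 2
```

With this definition, identical lists give a correlation of 1 and completely disjoint lists give 0, which matches the reading that a diverse following preference pushes the top-k ranking correlation toward zero.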

It was shown in the last section that users prefer to follow users
who have more followers. The ranking correlation
reflects the diversity of following preference. If the diversity is
large, it may happen that important users follow users with
few followers, or that users with many followers only attract
followers who are unimportant. In this situation, the top-k
ranking correlation will be close to zero. With more top users
involved, the diversity of following preference in Twitter becomes
larger, but it changes little in Sina Weibo. It can be concluded that
the following preference of Sina Weibo users is more concentrated
than that of Twitter users.
| Rank | Sina Weibo, by PageRank: Name | Remark | Sina Weibo, by number of followers: Name | Remark | Twitter, by PageRank: Name | Remark | Twitter, by number of followers: Name | Remark |
|------|------|------|------|------|------|------|------|------|
| 1 | Weibo Secretary | official service | Chen Yao | actress | Ashton Kutcher | actor | Ashton Kutcher | actor |
| 2 | Kevin Tsai | show host | Dee Hsu | actress/show host | CNN Breaking News | news | Ellen DeGeneres | show host |
| 3 | Dee Hsu | actress/show host | Nana Xie | show host | Barack Obama | president of USA | Britney Spears | musician |
| 4 | Chen Yao | actress | Kevin Tsai | show host | Ellen DeGeneres | show host | CNN Breaking News | news |
| 5 | Top news | news | Jiong He | show host | Britney Spears | musician | Oprah Winfrey | show host |
| 6 | Weibo iPhone | official iPhone service | Mini Yang | actress | Oprah Winfrey | show host | Twitter | official service |
| 7 | Xiaogang Feng | director | Vicki Zhao | actress | SHAQ | basketball star | Ryan Seacrest | show host |
| 8 | Faye Wong | singer | Leehom Wang | singer | Twitter | official service | Barack Obama | president of USA |
| 9 | Barbie Hsu | actress | Barbie Hsu | actress | Lance Armstrong | biking star | SHAQ | basketball star |
| 10 | New Weekly | magazine | Libo Zhou | comedian | Ryan Seacrest | show host | Kim Kardashian | model |
| 11 | Jiong He | TV show host | Kai-Fu Lee | businessman | Jimmy Fallon | actor | Demi Moore | actress |
| 12 | Kai-Fu Lee | businessman | Selected Jokes | jokes | iamdiddy | musician | Jimmy Fallon | actor |
| 13 | Sina Entertainment | entertainment news | John Tsai | singer | Demi Moore | actress | iamdiddy | musician |
| 14 | Vicki Zhao | actress | BingbingLi | actress | The New York Times | | Lance Armstrong | biking star |
| 15 | Nana Xie | TV show host | NBA | NBA | Perez Hilton | blog writer | The New York Times | news |
| 16 | Qi Shu | actress | Weibo Secretary | official service | Stephen Fry | actor | Coldplay | musician |
| 17 | Mini Yang | actress | Jianxiang Huang | sports commentator | Kim Kardashian | model | The Onion | |
| 18 | Leehom Wang | singer | Christine Fan | singer | The Onion | news | Al Gore | politician |
| 19 | Shiyi Pan | businessman | Show Luo | singer | Evan Williams | founder of Twitter | Ashley Tisdale | actress/musician |
| 20 | Christine Fan | singer | Qi Shu | actress | Kevin Rose | founder of Digg | 50cent | musician |



Fig. 10. Top 20 users ranked by PageRank and the number of followers of Sina Weibo and Twitter. 
Fig. 11. Ranking correlation of Sina Weibo and Twitter.



VI. PROPAGATION 

In this section, message propagation in Sina Weibo is
studied. It has been reported that Twitter is more of a social media
platform than a social network [13]. The low reciprocity rate and the following
preference suggest that Sina Weibo plays the role of social media
as well. Two examples are presented to display the propagation
features and critical users.




Fig. 12. Distribution of propagation. Panels: (a) Example 1; (b) Example 2. Axis label: delay in hours.



A. The Characteristics of Propagation of Weibos 

There are many studies of tweet propagation but still
very few of weibos in Sina Weibo. As far as we know, Yin
et al. studied the patterns of advertisement propagation in Sina
Weibo [29], and Yao et al. proposed a provenance model to capture
connections between micro-blog messages [28].

Hot weibos are tracked in order to study their propagation
features. It is found that most hot weibos propagate no
more than 10 levels; that is, the farthest user from the source
user is within 10 hops. Moreover, hot trends are found
to disappear very fast: most of them stay hot
for only hours, and very few last for days. However, hot weibos
show a powerful ability to reach a large coverage in a
very short time. A possible explanation is the complex
network structure, in which any two users are separated by a short
distance. In this subsection, two examples are presented to show
their propagation features. Table I gives brief information
about the examples, both of which concern hot social
issues.
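
The notion of a propagation level can be made concrete with a small sketch. It assumes, hypothetically, that each forwarding weibo is stored together with the id of the weibo it was forwarded from; the names `propagation_levels`, `level_histogram`, and the toy cascade are illustrative and do not come from the data set used in this work.

```python
from collections import Counter


def propagation_levels(parent_of, source):
    """Level (hop count from the source weibo) of every forwarded weibo.

    parent_of: dict mapping a forwarding weibo id to the id of the weibo
    it was forwarded from. Every chain is assumed to end at `source`.
    """
    levels = {source: 0}
    for weibo in parent_of:
        chain = []
        node = weibo
        # Walk up until we reach a weibo whose level is already known.
        while node not in levels:
            chain.append(node)
            node = parent_of[node]
        base = levels[node]
        for offset, item in enumerate(reversed(chain), start=1):
            levels[item] = base + offset
    return levels


def level_histogram(parent_of, source):
    """Number of forwarded weibos at each level (the source is excluded)."""
    levels = propagation_levels(parent_of, source)
    return Counter(level for weibo, level in levels.items() if weibo != source)


# Hypothetical cascade: b and e forward the source a,
# c forwards b, and d forwards c.
parent_of = {"b": "a", "c": "b", "d": "c", "e": "a"}
histogram = level_histogram(parent_of, "a")
deepest_level = max(histogram)
```

Here the deepest level is 3, and two of the four forwards sit at level 1, that is, they were forwarded directly from the source weibo.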



The coverage of a weibo is defined as the number of times
people read it. Table I also displays the forwarding number
and the coverage of the examples. The number of forwarded
weibos is counted according to their levels and delays and
is plotted in Figure 12. Both examples cause large-scale
propagation and cover a large number of users. Figure 12 also
shows different propagation patterns. In Figure 12(a), most
weibos are concentrated where the level is close to
one; they are forwarded directly from the source weibo. In
Figure 12(b), by contrast, there are some small wrinkles in the middle area.
They are small propagation trends led by some participants. The
propagation patterns are actually related to who posts the source
weibo. The source user of the first example is a grassroots
commentator, whereas the source user of the second example is a
well-known actor and writer who is acquainted with influential
people who can help him in the propagation. The subject of the
context has not been considered here, but involving the context would
give a more accurate description of the propagation pattern, which
may be effective for classifying tweets or weibos.

TABLE I

TWO EXAMPLES OF WEIBO PROPAGATION.

| Example | Source user | Weibo theme | Forwarding number | Coverage | Followers |
|---------|-------------|-------------|-------------------|----------|-----------|
| 1 | A grassroots user popular with his sharp comments on social issues | Comments on Chinese sailors seized by North Korea | 17507 | 34.4 million | 4 million |
| 2 | An actor and writer | Boycott a well-known milk brand because of substandard products | 21835 | 39.7 million | 1 million |

Fig. 13. Critical users in propagation. Legend: Example 1, Example 2. Horizontal axis: number of followers.

B. Critical Users in the Propagation 

In the propagation of a weibo, there are always some users
besides the source user who can lead relatively large secondary
propagation. They are the critical users in the propagation. It is found
that critical users are able to influence their followers to
participate in the propagation regardless of whether they are the
source user or not.

Sina Weibo records the forwarding number of a weibo in the
following way. Every weibo in the propagation has a forwarding
number. The forwarding number of the source weibo includes the
directly and indirectly forwarded weibos, while the forwarding
number of any other weibo includes only the directly forwarded weibos.
Suppose that user A posts the source weibo a, and user B forwards
weibo a from A and posts a forwarding weibo b. If user C forwards
b, then C's forwarding weibo, say c, is counted in B's and
A's forwarding numbers. But if user D now forwards weibo c, D's
forwarding weibo increases A's and C's forwarding numbers
but not B's. Whether a user is critical in the propagation is
decided by the number of directly forwarded weibos.
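
The counting rule can be checked against the A, B, C, D example with a short sketch. It is only an illustration of the rule as described here, not Sina Weibo's actual implementation; the functions `forwarding_numbers` and `critical_weibos` and the parent-map representation are assumptions, and each weibo is identified with the user who posted it.

```python
from collections import Counter


def forwarding_numbers(parent_of, source):
    """Forwarding number of every weibo in one cascade.

    parent_of: dict mapping a forwarding weibo id to the id of the weibo
    it was forwarded from. Non-source weibos count only direct forwards;
    the source weibo counts direct and indirect forwards.
    """
    direct = Counter(parent_of.values())
    numbers = {weibo: direct[weibo] for weibo in set(parent_of) | {source}}
    # Every forward in the cascade descends from the source weibo.
    numbers[source] = len(parent_of)
    return numbers


def critical_weibos(parent_of, threshold=10):
    """Weibos with more than `threshold` directly forwarded weibos."""
    direct = Counter(parent_of.values())
    return {weibo: count for weibo, count in direct.items() if count > threshold}


# The example from the text: b forwards a, c forwards b, d forwards c.
parent_of = {"b": "a", "c": "b", "d": "c"}
numbers = forwarding_numbers(parent_of, "a")
```

For this cascade the source weibo a has forwarding number 3, b and c have 1 each, and d has 0, so D's forward raises A's and C's numbers but leaves B's unchanged. The default threshold of 10 mirrors the cut-off used for Figure 13.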

Figure 13 shows the users who have more than 10 directly
forwarded weibos. The positive correlation between the number
of followers and the number of directly forwarded weibos
indicates that users with a large number of followers play a critical
role in the propagation even when they are not the source user.

VII. CONCLUSIONS

This paper presents a comparative study of users' following
preference in Sina Weibo and Twitter.

• Power-law degree distributions, a two-stage power-law distribution
of weibos, and a positive correlation between the
number of followers and the number of weibos are found.
The average distance and the effective diameter are both very
short for the size of Sina Weibo and Twitter.

• If only friends' relations are considered, both Sina Weibo
and Twitter exhibit some level of homophily. Sina Weibo has
a lower reciprocity rate, more positive balanced relations,
and is more disassortative than Twitter. Coinciding with
Asian traditional culture, the following preference of Sina
Weibo users is more concentrated and hierarchical: they
are more likely to follow people at higher or the same social
levels, and less likely to follow people lower than themselves.
In contrast, the same kind of following preference is much
weaker in Twitter, whose users are open as they follow
people from various levels. The following preference derives
not only from usage habits but also from underlying reasons
such as personalities and social moralities, which are worthy
of future research.

• Positive correlation between the number of followers and 
PageRank exists in both Sina Weibo and Twitter. Ranking 
correlation reflects that the following preference of Sina 
Weibo users is more concentrated while that of Twitter users 
is more diverse. 

• Propagation levels of hot weibos are small. The hot trends
disappear fast, but the coverage can be quite large. It is
found that the propagation patterns are related to the
source users, and the patterns may be effective for classifying
micro-blogs. It is also found that critical users, whose weibos
have large forwarding numbers, usually have many followers.

The comparison between the behavior of Sina Weibo and Twitter
users provides researchers with an excellent case for studying
cultural differences between China and the West. This work is
the first comparative study focusing on following behavior
using large-scale data sets of both a global and a local Chinese
online social network.